This document is a record of my book readings, an exercise in RMarkdown, and procrastinating material.

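# packages used throughout this document
library(data.table); library(dplyr); library(tidyr); library(lubridate)
library(ggplot2); library(viridis); library(plotly)
library(DT); library(reactable); library(reactablefmtr)

# read the reading log and give its nine columns short names
books <- fread("C:/Users/User/OneDrive - London School of Hygiene and Tropical Medicine/Documents/books/Book1.csv")
b <- select(books, everything())
names(b) <- c("name", "start", "end", "days", "rating", "type", "genre", "review", "link")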

# a one-day book has log(days) = 0, so add 0.5 to keep it visible when logdays is used as point size
b <- b %>%
  mutate(across(c(start, end), ~ as.Date(.x, format = "%d/%m/%Y")),
         logdays = log(days) + 0.5,
         lograting = log(rating)) %>%
  filter(!is.na(days)) %>%
  mutate(type = as.factor(type),
         genre = as.factor(genre),
         reviewed = case_when(review != "" ~ link, review == "" ~ ""))
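
To see why the offset is needed, here is a minimal sketch on made-up rows. The `toy` data frame and its values are illustrative assumptions, not entries from my records. A one-day book has a log of zero and would be drawn with no size at all; adding 0.5 keeps it visible, and subtracting it again before `exp()` recovers the original days.

```r
# Toy data: hypothetical books, not taken from the real CSV
toy <- data.frame(
  name  = c("one-day read", "week-long read", "month-long read"),
  start = c("01/02/2020", "03/02/2020", "01/03/2020"),
  end   = c("01/02/2020", "09/02/2020", "30/03/2020"),
  days  = c(1, 7, 30)
)

# Parse day/month/year strings into Date objects
toy$start <- as.Date(toy$start, format = "%d/%m/%Y")
toy$end   <- as.Date(toy$end,   format = "%d/%m/%Y")

# log(1) is 0, so shift every value up by 0.5 to keep one-day books visible
toy$logdays <- log(toy$days) + 0.5

# Undo the shift before exponentiating to get back to days
toy$days_back <- exp(toy$logdays - 0.5)

print(toy)
```

Any plot axis that shows `logdays` labelled in days needs the same back-transformation.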

According to reputable sources, the country that reads the most is India, with 10 hours per week; the US counts a measly six hours, and Japan and Korea boast an honest four and three hours respectively. In Europe, 80% of the inhabitants of Luxembourg, only half of whom are Luxembourgers, read at least one book a year. Only 30% of their fellow European-Unioners in Romania claim to achieve such reading rates.

An even more reputable source has found that the average US adult reads 12 books a year. Now this seems like a bit much, but of course it does. On the one hand, surveys*; on the other, most people don’t read much at all while some people read far more than 12 books a year, which results in very different mean and median statistics. And remember, surveys*. The typical person is much more likely to read close to 4 books a year.
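
The gap between mean and median is easy to reproduce. The sketch below uses made-up yearly counts for 11 hypothetical readers, chosen so that the mean comes out at 12 and the median at 4; these are not survey data.

```r
# Made-up yearly book counts for 11 hypothetical readers (not survey data)
books_per_year <- c(0, 0, 1, 2, 3, 4, 5, 6, 10, 31, 70)

mean_books   <- mean(books_per_year)    # pulled up by the two heavy readers
median_books <- median(books_per_year)  # the reader in the middle of the ranking

cat("mean:", mean_books, "median:", median_books, "\n")
```

Two heavy readers are enough to triple the mean while the reader in the middle stays at 4.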

I have been keeping an imperfect record of the books I have read and listened to “cover to cover” since 2017, now formatted in the table below; some have accompanying review links.

Complete Data Table
# drop helper columns, then build the table; the piped data is the only data argument
dt <- b %>% select(-logdays, -lograting, -review, -link) %>%
  DT::datatable(filter = "top") # try next reactable functions

dt

Before we do anything else, here is a plot to hover over. Can you find the Harry Potter cluster?

p <- b %>%
      ggplot(aes(end,rating, 
                 fill = type, stroke = .3, label = name, duration = days, review = reviewed)) +
      geom_jitter(width  = 0.45, height = 0.45, 
                  size = b$logdays, na.rm = FALSE) +
      scale_fill_viridis(discrete = TRUE) +
  xlab("date finished") +
  ggtitle("Reading timeline by rating, type and time taken")+
      theme_bw()
      

ggplotly(p, tooltip = c("name", "rating", "days", "review"))

and more

How do I compare to the average book-enjoyer?
What type and genre of book do I like the most?
Do I exhibit seasonal patterns?
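
For the last question, one possible starting point is to count finished books per calendar month. This is only a sketch on hypothetical finish dates (the `finished` vector is an assumption for illustration); on the real data the `end` column would take its place.

```r
# Hypothetical finish dates, for illustration only
finished <- as.Date(c("2019-01-05", "2019-01-20", "2019-07-14",
                      "2020-01-11", "2020-08-02", "2020-12-28"))

# Extract the month number and keep all 12 months, including empty ones
month <- factor(format(finished, "%m"), levels = sprintf("%02d", 1:12))
per_month <- table(month)

print(per_month)
```

Fixing the factor levels keeps months without any finished book in the table as zeros, so quiet months stay visible.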

# yearly totals, leaving out the current (incomplete) year
yavg <- b %>% filter(year(end) != year(today())) %>% group_by(year = year(end)) %>%
  summarize(n_books = n(), sum_days = sum(days), rating = round(mean(rating), 2))

yavg %>%
  reactable(.,
    defaultSorted = "n_books",
    defaultSortOrder = "desc",
    theme = fivethirtyeight(),
    columns = list(
      n_books = colDef(
      style = color_scales(.)
    ),
    sum_days = colDef(
      style = color_scales(.)
    ),
    rating = colDef(
      style = color_scales(.)
    ))
  ) %>% 
    add_subtitle("yearly counts")

yearly counts

avgyear <- mean(yavg$n_books)

My suspicions are confirmed: I read more books than the average person, with an average of 12.6 a year. This makes sense because I really like reading books; I do it for fun when I have time.

# books per type and year; Var1 holds the type, so label it as such
tabt <- pivot_wider(as.data.frame(table(b$type, year(b$end))),
                    id_cols = "Var1", names_from = "Var2", values_from = "Freq") %>%
  rename(Type = Var1) %>%
  rowwise(.) %>% mutate(Total = sum(c_across(where(is.numeric)))) %>% ungroup() %>%
  # mean rating per type, in factor-level order (the same order as the table rows)
  mutate(Rating = round(as.numeric(tapply(b$rating, b$type, mean)), 2))
  
reactable(tabt, theme = fivethirtyeight()) %>% add_subtitle("book types read")

book types read

# books per genre and year; Var1 holds the genre, so label it as such
tabg <- pivot_wider(as.data.frame(table(b$genre, year(b$end))),
                    id_cols = "Var1", names_from = "Var2", values_from = "Freq") %>%
  rename(Genre = Var1) %>%
  rowwise(.) %>% mutate(Total = sum(c_across(where(is.numeric)))) %>% ungroup() %>%
  # mean rating per genre, in factor-level order (the same order as the table rows)
  mutate(Rating = round(as.numeric(tapply(b$rating, b$genre, mean)), 2))

reactable(tabg, theme = fivethirtyeight()) %>% add_subtitle("book genres read")

book genres read

I read far more fiction books than anything else, especially in the last two years. This is not because I rate fiction higher than the other types; the average ratings by type are very close.

A better interactive timeline?

# one row per start and end date; id cycles through a few rows so books do not overlap
ts <- b %>% mutate(id = rep(1:floor(avgyear), length.out = nrow(b))) %>%
  pivot_longer(c(start, end), names_to = "key", values_to = "value") %>%
  mutate(value = as.Date(value), xaxis = lubridate::yday(value)) #%>% filter (year(value) == 2019)

p <- ggplot(ts, aes(x = value, y = id, color = rating)) +
  geom_point(pch = 4) +
  # connect the start and end point of each book
  geom_path(aes(group = name)) +
  scale_x_date(name = "time") +
  theme_bw() +
  theme(axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank())

ggplotly(p, dynamicTicks = TRUE, width = 920) %>%
  layout(
    xaxis = list(rangeslider = list(bgcolor = "#A9B0D6", bordercolor = "#000000", borderwidth = 1, thickness = 0.08)),
    yaxis = list(fixedrange = TRUE))

Scatterplot of rating and time taken to read faceted by year

p <- ggplot(b, aes(x = logdays, y = rating, colour = type)) +
  geom_jitter() +
  # logdays = log(days) + 0.5, so undo the offset before labelling the axis in days
  scale_x_continuous(breaks = seq(from = min(b$logdays), to = max(b$logdays), length.out = 5),
                     labels = function(x) round(exp(x - 0.5))) +
  xlab("days") +
  theme_bw() +
  facet_wrap(~ year(end))

ggplotly(p, width = 920)